Fault Tolerance in Transaction Systems
Abstract
We survey two schemes for fault tolerance under different fault models. The first, the primary-backup approach, deals with disaster recovery. The second is aimed at developing commit protocols that tolerate commission failures.
A remote backup database system tracks the state of a primary system and takes over transaction processing when disaster strikes the primary site. The primary and backup sites are physically isolated so that failures at one site are unlikely to propagate to the other. For correctness, the execution schedule at the backup must be equivalent to that at the primary. When the primary and backup each contain a single processor, this property is relatively easy to achieve. It is harder to achieve when each site contains multiple processors and the sites are connected via multiple communication lines. We survey algorithms for maintaining backup copies that guarantee correctness and other important properties. Remote backup systems are typically log-based and can be classified as 2-safe or 1-safe, depending on whether transactions commit at both sites simultaneously or commit first at the primary and are later propagated to the backup. We study algorithms of both types.
Transactions often involve untrusted sites, e.g., a home banking transaction. The objective is to develop a commit protocol that tolerates both omission and commission failures of the untrusted sites, as well as omission failures of the trusted sites. We survey an algorithm that achieves this objective without resorting to a Byzantine agreement protocol, which has advantages in terms of speed and system complexity.
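The 1-safe/2-safe distinction and the ordering requirement at the backup can be illustrated with a small simulation. The following is a minimal Python sketch under simplifying assumptions, not one of the surveyed algorithms: `Site`, `RemoteBackup`, and `apply_in_commit_order` are hypothetical names, each commit is a single in-memory write set, and the primary is assumed to stamp commits with a gap-free global sequence number. It shows that a 1-safe system can lose transactions that committed at the primary but had not yet been propagated when disaster hit, while a 2-safe system does not.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One database site: committed state plus the order in which transactions were applied."""
    name: str
    state: dict = field(default_factory=dict)
    applied: list = field(default_factory=list)

    def apply(self, txn_id, writes):
        self.state.update(writes)
        self.applied.append(txn_id)


class RemoteBackup:
    """Hypothetical primary/backup pair that ships log records in 1-safe or 2-safe mode."""

    def __init__(self, mode):
        if mode not in ("1-safe", "2-safe"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.primary = Site("primary")
        self.backup = Site("backup")
        self.pending = []  # log records committed at the primary but not yet shipped (1-safe)

    def commit(self, txn_id, writes):
        if self.mode == "2-safe":
            # 2-safe: the transaction is not reported committed until the backup has it too.
            # (Simplification: a real system waits for the backup to log and acknowledge.)
            self.backup.apply(txn_id, writes)
            self.primary.apply(txn_id, writes)
        else:
            # 1-safe: commit at the primary now; propagate the log record to the backup later.
            self.primary.apply(txn_id, writes)
            self.pending.append((txn_id, writes))

    def propagate(self):
        """Ship queued log records to the backup in primary commit order."""
        for txn_id, writes in self.pending:
            self.backup.apply(txn_id, writes)
        self.pending.clear()

    def disaster(self):
        """The primary site is lost; return committed transactions the backup never received."""
        lost = [txn_id for txn_id, _ in self.pending]
        self.pending.clear()
        return lost


def apply_in_commit_order(backup, arrivals):
    """Apply (seq, txn_id, writes) records that may arrive out of order over several lines.

    Assumes the primary stamps each commit with a gap-free global sequence number;
    a record is held until every lower-numbered record has been applied, so the
    backup's schedule matches the primary's commit order. Returns the next expected seq.
    """
    buffered = {}
    next_seq = 0
    for seq, txn_id, writes in arrivals:
        buffered[seq] = (txn_id, writes)
        while next_seq in buffered:
            held_id, held_writes = buffered.pop(next_seq)
            backup.apply(held_id, held_writes)
            next_seq += 1
    return next_seq


if __name__ == "__main__":
    one_safe = RemoteBackup("1-safe")
    one_safe.commit("T1", {"x": 1})
    one_safe.propagate()
    one_safe.commit("T2", {"x": 2})
    print(one_safe.disaster(), one_safe.backup.state)  # ['T2'] {'x': 1}

    two_safe = RemoteBackup("2-safe")
    two_safe.commit("T1", {"x": 1})
    two_safe.commit("T2", {"x": 2})
    print(two_safe.disaster(), two_safe.backup.state)  # [] {'x': 2}

    b = Site("backup")
    apply_in_commit_order(b, [(1, "T2", {"x": 2}), (0, "T1", {"x": 1}), (2, "T3", {"y": 5})])
    print(b.applied)  # ['T1', 'T2', 'T3']
```

Applying the out-of-order arrivals naively would leave `x = 1` at the backup even though the primary committed `x = 2` last. Holding records until their predecessors arrive is one simple way to keep the schedules equivalent, but it relies on a single global sequence that a multi-processor primary may not naturally produce, which is part of what makes the multi-processor case harder.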
Similar Resources
Facilitating the Design of Fault Tolerance in Transaction Level SystemC Programs
Due to their increasing complexity, today’s SoC (System on Chip) systems are subject to a variety of faults (e.g., soft errors, component crash, etc.), thereby making fault tolerance a highly important property of such systems. However, designing fault tolerance is a complex task in part due to the large scale of integration of SoC systems and different levels of abstraction provided by modern ...
Fault Tolerance Lessons Applied to Parallel Computing
This paper describes an approach to fault-tolerant parallel computing which is based on the experiences with the most successful fault-tolerant software – the transaction processing systems. The algorithms presented here have less runtime overhead and faster recovery than most preceding approaches. In the Pact parallel programming environment fault tolerance is provided fully user transparent i...
When Is Operation Ordering Required in Replicated Transactional Storage?
Today’s replicated transactional storage systems typically have a layered architecture, combining protocols for transaction coordination, consistent replication, and concurrency control. These systems generally require costly strongly-consistent replication protocols like Paxos, which assign a total order to all operations. To avoid this cost, we ask whether all replicated operations in these s...
Dynamic Restructuring of Transactions
Open-ended activities are characterized by uncertain duration, unpredictable developments, and interactions with other concurrent activities. Like other database applications, they require consistent concurrent access and fault-tolerance, but their unconventional characteristics are incompatible with the conventional database mechanisms of concurrency and failure atomicity. We present the spli...
An approach to fault detection and correction in design of systems using of Turbo codes
We present an approach to the design of fault-tolerant computing systems. In this paper, a technique is employed that enables the combination of several codes in order to obtain flexibility in the design of error-correcting codes. Code combining techniques are very effective, and turbo codes are one example of such codes. The algorithm-based fault tolerance techniques used to detect errors rely on the c...
Modeling Fault Tolerant and Secure Mobile Agent Execution in Distributed Systems
The reliable execution of mobile agents is a very important design issue in building mobile agent systems and many fault-tolerant schemes have been proposed so far. Security is a major problem of mobile agent systems, especially when monetary transactions are concerned. Security for the partners involved is handled by encryption methods based on a public key authentication mechanism and by secr...